A Unified Approach to Speculative Parallelization of Loops in DSM Multiprocessors
Authors
Abstract
Speculative parallel execution of statically non-analyzable codes on Distributed Shared-Memory (DSM) multiprocessors is challenging because of the long latencies and memory distribution involved. However, such an approach may well be the best way of speeding up codes whose dependences cannot be analyzed by the compiler. In this paper, we extend past work by proposing a hardware scheme for the speculative parallel execution of loops that have a modest number of cross-iteration dependences. When a dependence violation is detected, we repair the state locally and then, depending on the situation, either re-execute one out-of-order iteration or restart parallel execution from that point on. The general algorithm, called the Unified Privatization and Reduction algorithm (UPAR), privatizes on demand at cache-line granularity, executes reductions in parallel, and merges last values and partial reduction results on the fly, leaving minimal residual work at the end of the loop. UPAR allows fully dynamic scheduling and does not slow down when the working set of an iteration exceeds the cache size. Simulations indicate good speedups relative to sequential execution. The hardware support for reduction optimizations brings, on average, a 50% performance improvement and can be used in both speculative and normal execution.
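The detect-violation, repair-locally, re-execute policy described in the abstract can be sketched in software. The following is an illustrative emulation only, not the paper's hardware design; the function `run_speculative`, the address-keyed memory, and the toy loop body are all hypothetical names introduced here for demonstration:

```python
from collections import defaultdict

def run_speculative(body, mem, order):
    """Emulate one speculative pass over loop iterations executed in the
    out-of-order schedule `order`, logging reads and writes per address.
    Afterwards, detect cross-iteration violations (an iteration read a
    location before a logically earlier iteration wrote it) and re-execute
    the violated iterations in program order -- a software analogue of the
    "repair locally, then re-execute" policy. This sketch does a single
    repair pass; longer dependence chains would need the paper's
    restart-from-the-violation-point fallback.
    """
    write_slot = {}                  # addr -> (logical iter, schedule slot) of last write
    read_slots = defaultdict(list)   # addr -> [(logical iter, schedule slot)]

    def tracked_ops(it, slot):
        def load(addr):
            read_slots[addr].append((it, slot))
            return mem[addr]
        def store(addr, val):
            write_slot[addr] = (it, slot)
            mem[addr] = val
        return load, store

    for slot, it in enumerate(order):        # speculative, out-of-order pass
        body(it, *tracked_ops(it, slot))

    violated = set()
    for addr, readers in read_slots.items():
        if addr in write_slot:
            wi, wslot = write_slot[addr]
            for ri, rslot in readers:
                # Iteration `ri` read `addr` before the logically earlier
                # iteration `wi` produced it: a stale value was consumed.
                if wi < ri and wslot > rslot:
                    violated.add(ri)

    for it in sorted(violated):              # local repair: re-run in order
        body(it, mem.__getitem__, mem.__setitem__)
    return mem

# Toy loop with one cross-iteration dependence:
# iteration 2 produces 'y', iteration 3 consumes it.
def body(it, load, store):
    if it == 2:
        store('y', 10)
    elif it == 3:
        store('x', load('y') + 1)

# Schedule iteration 3 first so it reads 'y' too early, then gets repaired.
mem = run_speculative(body, {'x': 0, 'y': 0}, order=[3, 0, 1, 2])
print(mem['x'], mem['y'])
```

Running iteration 3 before iteration 2 makes it consume a stale value of 'y'; the post-pass check flags it and the sequential re-execution restores the result that in-order execution would have produced.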
Similar Papers
Hardware for Speculative Run-Time Parallelization in Distributed Shared-Memory Multiprocessors
Run-time parallelization is often the only way to execute the code in parallel when data dependence information is incomplete at compile time. This situation is common in many important applications. Unfortunately, known techniques for run-time parallelization are often computationally expensive or not general enough. To address this problem, we propose new hardware support for efficient run-time...
Speculative Parallel Execution of Loops with Cross-Iteration Dependences in DSM Multiprocessors
Speculative parallel execution of non-analyzable codes on Distributed Shared-Memory (DSM) multiprocessors is challenging due to the long latencies and distribution involved. However, such an approach may well be the best way of speeding up codes whose dependences cannot be analyzed by the compiler. In previous work, we suggested executing the loop speculatively in parallel and adding extensions to the...
A Feasibility Study of Hardware Speculative Parallelization in Snoop-Based Multiprocessors
Run-time parallelization is a technique for parallelizing programs with data access patterns difficult to analyze at compile time. In this paper we examine the hardware implementation of a run-time parallelization scheme, called speculative parallelization, on snoop-based multiprocessors. The implementation is based on the idea of embedding dependence checking logic into the cache controller o...
When All Else Fails, Guess: The Use of Speculative Multithreading for High-Performance Computing
Fundamental physical limits are being encountered in the design of integrated circuits that will limit future increases in processor clock rates. As a result, computer architects are developing aggressive new mechanisms to execute instructions speculatively, that is, before it is known whether or not they should actually be executed, and even before the input values needed by the instructions...
Techniques for Module-Level Speculative Parallelization on Shared-Memory Multiprocessors: Research Proposal
Multiprocessors have hit the mainstream and cover the whole spectrum of computational needs, from small-scale symmetric multiprocessors to scalable distributed shared-memory systems with a few hundred processors. This has made it possible to boost the performance of a number of important applications from the numeric and database domains. Extending the scope of applications that can take advantag...